1. INTRODUCTION

Slope failures can cause environmental, economic and psychological harm, property damage and even the
loss of human life. That is why site stability is considered a major concern in most civil engineering
projects. Unfortunately, even though soil erosion is one of the major causes of this kind of instability,
it is often neglected, since its effects only show in the long term [1].

Soil erosion, by definition, is both a geomorphic and a land degradation process that detaches soil
particles, rock fragments, soil aggregates and organic matter from their primary location and then
transports them to another location. Affected, and often intensified, by human intervention, natural
erosion processes have shown relevant increases in their rates of occurrence across all kinds of
landscapes [2], [3]. This kind of land degradation is a major cause of slope instability, which occurs when
the soil resistance drops. This decrease is mainly caused by: reduction of matric suction stress,
discontinuities (faults, joints and fractures), modification of the structure of sensitive soils,
liquefaction of fine saturated sand and loss of cohesion [4].

This paper provides a computer vision approach for slope structural damage detection, since computer
vision seeks to interpret and understand the visual world. Apparent damage on slopes is the main object of
this study, as erosions, discontinuities, animal dens and anthills are threats to their structural
integrity. Since the damage caused to slope structures is, at some point, apparent, image analysis and
anomaly detection through convolutional neural networks (CNNs) arise as a suitable option for damage
identification. Such occurrences are multifactorial and some of the failure modes can only be detected
visually, which justifies the use of computer vision techniques in those scenarios.

To address the aforementioned issues, this article proposes a CNN with 3 convolutional layers and 2
fully-connected (FC) layers to detect apparent structural damage on slopes and classify whether a slope is
stable, given its visual diagnosis. This kind of neural network enables the early detection of slippage and
erosion occurrences, preventing accidents and maintaining the integrity of the slope and of the environment
in which it is found. A CNN-based image analysis and classification software was preferred due to the ease
of merging it with existing monitoring systems such as drones, satellites and any other type of system that
generates images, enhancing their precision, objectivity and processing speed [5].

The remaining sections are organized as follows: section 2 presents the related works along with the
concepts of the main technologies and their role in the proposed approach, addressing their pros and cons
as well as their current state. Section 3 describes the proposed network and how it works. Section 4
presents the performance analysis of the developed CNN, highlighting the training and test details that
yielded the best result for the detection of slope instability. Finally, section 5 concludes the paper with
the analysis results and the improvements that this proposal still needs to undergo.


2. RELATED WORKS AND BACKGROUND CONCEPTS 

This section presents a brief literature review, including several studies and proposals about
artificial neural networks (ANNs) for image detection as well as slope stability prediction methods, as
described in Table 1. For the development of this paper, the bibliographic research revolved around
well-founded articles that would serve as the basis for new ANN implementations. This paper seeks to merge
the main themes and technologies of the reviewed articles in order to propose the most suitable
architecture, so that the best performance is achieved by the neural network.

Among the studied proposals, the following stand out: Yang et al. [6] trained and tested deep learning
(DL) models using the you only look once (YOLO) approach to analyze synthetic data obtained from a
graphical model of bolt loosening marks at wind turbine towers, achieving an accuracy of 95.71% with a
detection time of 0.024 seconds for a single image using YOLO v3. Liu et al. [7] presented a method to
improve the slope stability analysis process, denominated finite element (FELEM), which uses elastic finite
elements and whose results can be analyzed by DL methods such as CNNs, but it requires an extensive and
reliable dataset to enable this approach.


Table 1. Used base related works
(the yes/no columns list the covered technologies)

Studied research       | Slope stability detection | Deep learning | CNN | ReLU | Batch normalization | Research summary
-----------------------|---------------------------|---------------|-----|------|---------------------|-----------------
Yang et al. [6]        | no                        | yes           | yes | no   | yes                 | Innovative proposal using YOLO v3 to achieve a fast and precise neural network for bolt loosening detection for wind turbine towers.
Liu et al. [7]         | yes                       | no            | no  | no   | no                  | A two-dimensional (2D) and 3D slope stability FELEM using elastic finite element stress fields was proposed.
Qi and Tang [8]        | yes                       | yes           | no  | no   | yes                 | Proposed a comparison between six different machine learning (ML) algorithms.
Lu et al. [9]          | no                        | yes           | yes | yes  | no                  | Tested different types of CNN architectures and datasets to find the optimal plant leaf disease classification method.
Kattenborn et al. [10] | no                        | yes           | yes | yes  | yes                 | Well explained neural network model for vegetation classification.
Yadav and Jadhav [11]  | no                        | yes           | yes | no   | yes                 | Disease diagnosis using a CNN-based architecture with different parameters and techniques to acquire the best performance.


Qi and Tang [8] proposed a comparison between six ML algorithms: logistic regression (LR), decision
tree (DT), random forest (RF), gradient boosting machine (GBM), support vector machine (SVM), and
multilayer perceptron neural network (MLPNN). Lu et al. [9] highlight the scenarios in which a CNN performs
satisfactorily in image classification, and works like the one presented by Kattenborn et al. [10], a
review on CNNs in vegetation remote sensing, complement it. Such works allow ideas like the one in this
paper to emerge, since these types of applications can arise in the most diverse areas.

Similar CNN solutions [12]-[18] differ from the one proposed in this paper in that the developed
architecture, combined with the technologies presented in the following subsections, had better results
and achieved greater performance when compared to the most common ones in the current literature. This is
due to the need for multiple convolutional layers to properly process the available slope dataset and to
enable the system to be easily combined with already existing monitoring systems, using the images obtained
by them (drones and satellites) to extract information regarding the visual integrity of the slope.

This work seeks to explore the use of computer vision in conjunction with high-performance processing
techniques, such as CNNs, to detect damage in geotechnical structures, since such detection is essential to
the reliability of these kinds of structures and of others that depend on them. The next subsections
provide the definitions and essential concepts needed for the solution proposed by this paper to be fully
understood. They are ordered as follows: slope stability, DL for image classification, CNN particularities,
the rectified linear unit (ReLU) activation function, the Adam optimizer and batch normalization.


2.1. Slope stability 

Slope stability assessment is crucial for geotechnics and civil engineering, as artificial slopes and
the design of dams, embankments, and similar structures directly impact the ecosystem. Slopes are terrains
that serve as a support base for the soil, and may have a natural or artificial (man-made) origin.
Artificial slopes consist mostly of embankments built to prevent landslides in places where stability is
not sufficient [19]. The topographic elevation, aspect and angle of a slope are the most important
geomorphological factors, and they directly influence slope stability. Accordingly, the risk of landslides
is higher on steeper slopes, and it is advised not to exceed 45°, since steeper inclinations cannot
guarantee stability [20].

This type of containment is chosen in order to resist the pressure generated by the earth, as it
improves the soil through the execution of tie rods, anchors, shotcrete and drainage. There are several
techniques that aim to protect a slope, and the choice of which one to use depends on the type of project
designed. This protection can be made of stone coating, concrete, retaining walls, slope berms and
anchorage, among other measures. Depending on the length of the slope, it is advisable to build contour
lines to avoid erosion caused by rain. Another method of preservation is the use of vegetation to cover the
slopes, providing greater stability [20], [21].

In this paper, slope instability is considered as the removal of the vegetation cover, soils and 
underlying loose material. An instability can be classified into two categories: landslides and erosion. The 
first one represents the gravitational mass movements that occur when the stress exceeds the mechanical 
resistance of a slope, and the second concept includes the removal of the vegetation and/or topsoil caused by 
different types of erosion [22]. 


2.2. Deep learning for image classification 

DL is a concept that derives from the conventional neural network, but outperforms it by employing
transformations and graph technologies simultaneously, becoming a multi-layer network. It has outperformed
other types of ML architectures in many tasks and is able to process audio, images and natural language,
among others [23]. Among all types of deep neural network models, three stand out when it comes to image
processing: deep belief networks (DBNs), stacked autoencoders (SAEs) and CNNs [24].

For this proposal, a custom CNN-based architecture is created because of the advantages that this
type of neural network offers for image classification. The structure of CNNs was inspired by the operation
of biological vision, and they have become a successful tool in computer vision and state-of-the-art models
of neural activity and visual tasks. They start their process by convolving a set of filters with the input
and rectifying the outputs, leading to “feature maps”, akin to the planes of S-cells in the neocognitron
[25].

Basically, the convolution layer is responsible for convolving the image patches. Then, the pooling
layer resizes the feature maps that resulted from the previous process, to obtain more abstract and
universal features. In the last step, these maps are transformed into vectors by the fully connected layer
[26]. A representative structure of CNNs is shown in Figure 1, containing two convolutional-pooling layers
and a fully-connected one.


2.3. CNN particularities 

To decrease the number of weight connections and optimize the training and test steps, a more
efficient method emerged, which looks at local regions of the image instead of connecting every pixel to
every neuron. The hidden neurons in the next layer thus only get inputs from the corresponding part of the
previous layer. For example, each neuron can be connected to only a 5x5 region of a 64x64x3 input. Thus, if
we have 64x64 neurons in the next layer, there will be 5x5x3 by 64x64 connections, which is 307,200
(instead of the 50,331,648 needed to fully connect it). To further simplify the neural network connections,
we can share the same weights among all neurons in the next layer, so that neighboring neurons apply the
same weights to the region they analyze in the previous layer. The parameters thereby undergo another
significant reduction, resulting in only 5x5x3=75 weights to connect 64x64x3 neurons to 64x64 in the next
layer (from 50,331,648 to 75) [27]-[29].
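
The counts in this example can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes a 64x64x3 input, a 64x64 next layer and a 5x5 receptive field, in which case the locally connected case evaluates to 5x5x3x64x64 = 307,200.

```python
def fully_connected_links(in_h, in_w, in_c, out_h, out_w):
    """Every input value is linked to every neuron of the next layer."""
    return in_h * in_w * in_c * out_h * out_w


def locally_connected_links(k, in_c, out_h, out_w):
    """Each neuron of the next layer sees only one k x k x in_c patch."""
    return k * k * in_c * out_h * out_w


def shared_weights(k, in_c):
    """With weight sharing, one k x k x in_c filter is reused everywhere."""
    return k * k * in_c


full = fully_connected_links(64, 64, 3, 64, 64)   # 50331648
local = locally_connected_links(5, 3, 64, 64)     # 307200
shared = shared_weights(5, 3)                     # 75
print(full, local, shared)
```

Biases are left out of the counts, as in the text.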

As represented in Figure 2, the convolutional layer uses filters that perform convolution operations
as they scan the input. These operations are performed over the entire image, by sliding the filter and
computing the dot product between it and each part of the input image. The output of this operation is
termed a feature map, which provides information about the corners and edges of the image and is read by
other layers, so they can learn the remaining features of the image [29].
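
The sliding dot product can be sketched in plain Python as follows. This is a minimal illustration and not the implementation used in this paper: it assumes a single-channel image, no padding and a stride of 1, and both the toy image and the 2x2 edge filter are hypothetical. As in most DL libraries, the filter is applied without flipping.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` and return the map of dot products.

    Both arguments are lists of rows. No padding and a stride of 1 are
    assumed, so the output shrinks by (kernel size - 1) along each axis.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the filter and the patch under it.
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image with a vertical edge between the second and third columns.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 filter that responds to left-to-right intensity jumps.
edge_filter = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, edge_filter)
print(feature_map)  # each row is [0.0, 2.0, 0.0]
```

The response is non-zero only where the filter straddles the edge, which is the kind of corner and edge information a feature map carries.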

Images commonly carry a large amount of redundant information, hence the need for pooling layers
inserted between several of the convolutional layers of the network, to avoid substantial performance
degradation [30]. Max-pooling has been shown to be the best pooling technique for representations that
rely on count statistics, such as bag-of-visual-words (BOV) ones, because it reduces the size of the hidden
layers by an integer multiplicative factor, improving performance in applications that process many images
[31]-[33]. Max-pooling creates position invariance over larger local regions and down-samples the input
image by a factor of K_x and K_y along each direction, leading to a faster convergence rate by selecting
superior invariant features, which improves generalization performance [34], [35].
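
The down-sampling by K_x and K_y can be illustrated with a short sketch. It assumes non-overlapping windows (stride equal to the window size) and drops rows or columns that do not fill a complete window; the 4x4 feature map is a made-up example.

```python
def max_pool2d(feature_map, k_y=2, k_x=2):
    """Non-overlapping max pooling over k_y x k_x windows."""
    out_h = len(feature_map) // k_y
    out_w = len(feature_map[0]) // k_x
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the values of one window and keep only the largest.
            window = [
                feature_map[i * k_y + di][j * k_x + dj]
                for di in range(k_y)
                for dj in range(k_x)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

Moving the largest value anywhere inside its window leaves the output unchanged, which is the position invariance mentioned above.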


[Figure 1 diagram, in order: input (image), convolutional layer, pooling layer, convolutional layer,
pooling layer, fully connected layer, output layer]

Figure 1. Illustrative CNN structure with two convolutional-pooling layers and one fully connected layer

Figure 2. Convolutional layer flowchart

The last major particularity is the FC layer. The features generated by the last convolutional layer
correspond to only a portion of the input, so the network’s view does not cover the entire spatial
dimension of the image. This lack of coverage makes the FC layer a mandatory part of the network. Its
parameters, such as the number of FC layers and the number of neurons required in each of them, vary for
each scenario [36].


2.4. Activation layer ReLU 

Activation functions are commonly used after each convolution layer in order to introduce
non-linearity at the neuron's output, since image data are not linearly separable [37]. ReLU is one of the
most widely used activation functions, both for CNNs and for DL in general, largely due to its efficiency.
Although it may appear to be a linear function, it is not: it is piecewise linear, and its derivative
allows backpropagation. However, when the input is zero or negative, the gradient of the function is zero,
which prevents the weights feeding that neuron from being updated through backpropagation [38].

In practice, the outputs of the filters are submitted to the activation function at the end of each
and every convolutional layer and, after going through this process, are used to update the neural network
weights [39]. By observing the mathematical expression of the ReLU function, f(x) = max(0, x), it can be
seen that the neurons will only activate if the input is greater than zero, so neurons that receive
negative values will be “erased”. This particularity increases the training speed and reduces the
computational cost; however, if a neuron receives only negative values, it will not learn anything.
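
A minimal sketch of f(x) = max(0, x) and of its derivative makes this behaviour concrete. The sample inputs are arbitrary, and the derivative at exactly x = 0 is taken as 0 here, which is a common convention rather than something stated in this paper.

```python
def relu(x):
    """ReLU activation: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU; the value at x = 0 is set to 0 by convention."""
    return 1.0 if x > 0 else 0.0


inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
activations = [relu(x) for x in inputs]
gradients = [relu_grad(x) for x in inputs]
print(activations)  # [0.0, 0.0, 0.0, 0.5, 2.0]
print(gradients)    # [0.0, 0.0, 0.0, 1.0, 1.0]
```

A neuron whose inputs are always negative sits permanently in the zero-gradient region, which is why it stops learning.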


2.5. Adam optimizer 

An adaptive optimization algorithm is used to find the best weights for a neural network, aiming at
the minimization of the error function (the closer to zero, the better), which reflects in the reduction
of the error of the network as a whole. Adaptive moment estimation (Adam) combines the best properties of
the AdaGrad and root mean square propagation (RMSProp) algorithms to provide an optimization algorithm that
handles noisy problems [40]. It was proposed by Kingma and Ba [41] and is one of the most popular step size
methods in the area of neural networks. It has been reported to converge faster than other optimizers for
multi-layer neural networks or CNNs, but it is not quite as good for generalization [42].
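
For reference, the Adam update rule can be sketched for a single scalar parameter. The defaults for beta1, beta2 and eps follow the values commonly associated with Kingma and Ba [41]; the quadratic test function, the starting point and the learning rate of 0.1 are illustrative assumptions and are unrelated to the training runs of this paper.

```python
import math


def adam_minimize(grad_fn, x0, lr=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Minimize a scalar function given its gradient, using Adam."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        # Moving averages of the gradient and of its square.
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction for the zero initialization of m and v.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy problem: f(x) = x^2, whose gradient is 2x and whose minimum is at 0.
x_min = adam_minimize(lambda x: 2.0 * x, x0=5.0, lr=0.1, steps=1000)
print(x_min)
```

Because the step is scaled by the running gradient magnitude, the very first update has a size close to the learning rate regardless of how large the gradient is.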


2.6. Batch normalization 

Batch normalization (BN) allows the hyperparameters to be more freely defined, as it significantly
reduces training time by normalizing the input of each layer in the network, and not only the input layer
itself. This approach allows the use of higher learning rates, reducing the number of training steps [43].
These advantages make batch normalization a natural candidate to speed up the training of the different
combinations of hyperparameters needed to optimize the use of dropout layers, as it makes the network
converge faster [43]. During training, BN computes the mean and variance of the activations within each
mini-batch and tracks them through an exponential moving average with an update factor; in the testing
phase, it uses those averaged values for whitening the input activations [44].
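
The training and testing behaviour of BN can be sketched for a single scalar feature. The class name, the update factor of 0.9 and the epsilon of 1e-5 are assumptions made for illustration, and the learnable scale and shift are kept at their initial values.

```python
import math


class BatchNorm1D:
    """Batch normalization for a single feature (scalar activations)."""

    def __init__(self, momentum=0.9, eps=1e-5):
        self.momentum = momentum  # update factor of the moving averages
        self.eps = eps
        self.gamma = 1.0          # learnable scale, left at its initial value
        self.beta = 0.0           # learnable shift, left at its initial value
        self.running_mean = 0.0
        self.running_var = 1.0

    def forward_train(self, batch):
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
        # Exponential moving averages kept for the testing phase.
        keep = self.momentum
        self.running_mean = keep * self.running_mean + (1 - keep) * mean
        self.running_var = keep * self.running_var + (1 - keep) * var
        scale = self.gamma / math.sqrt(var + self.eps)
        return [scale * (x - mean) + self.beta for x in batch]

    def forward_test(self, batch):
        # Uses the stored averages instead of the statistics of `batch`.
        scale = self.gamma / math.sqrt(self.running_var + self.eps)
        return [scale * (x - self.running_mean) + self.beta for x in batch]


bn = BatchNorm1D()
normalized = bn.forward_train([2.0, 4.0, 6.0, 8.0])
print(normalized)                        # zero mean, roughly unit variance
print(bn.running_mean, bn.running_var)   # about 0.5 and 1.4
```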


3. PROPOSED APPROACH 

This paper proposes a CNN architecture with 3 convolutional layers that detects different types of
erosion and landslides, adding redundancy to the inspection process and often sparing professionals a trip
to the site. On the other hand, it does not provide the calculation of the factor of safety, meaning that
a complete slope stability assessment cannot be carried out by this method. It was not possible to perform
this calculation due to the unavailability of a robust dataset that ensures data labeling for all the
needed scenarios.

The dataset analyzed in this paper consists of a total of 300 images: 200 erosion images and 100
images of slope structures without any visible damage, considered stable (without visible erosion or
landslide). The images are distributed in the following proportion: 73% for training (219 images), 17% for
validation (51 images) and 10% for testing (30 images). The validation dataset differs from the test
dataset because it is used to estimate the skill of the model while tuning the hyperparameters, whereas the
test dataset is held out to give an unbiased estimate of the skill of the final tuned model. The evaluation
on the validation dataset becomes more biased as skill on it is incorporated into the model configuration
[45].
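
A split with these sizes can be sketched as follows. The file names and the seed are hypothetical, and a plain random shuffle is assumed, since the exact splitting procedure is not detailed here.

```python
import random


def split_dataset(items, n_train, n_val, n_test, seed=0):
    """Shuffle `items` and cut them into three disjoint subsets."""
    if n_train + n_val + n_test != len(items):
        raise ValueError("subset sizes must add up to the dataset size")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical file names: 200 erosion images and 100 stable-slope images.
images = [f"erosion_{i:03d}.jpg" for i in range(200)]
images += [f"stable_{i:03d}.jpg" for i in range(100)]

train, val, test = split_dataset(images, n_train=219, n_val=51, n_test=30)
print(len(train), len(val), len(test))  # 219 51 30
```

Keeping the three subsets disjoint is what allows the test images to act as data the tuned model has never seen.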

Figure 3 and Figure 4 represent the types of photos used by the network to learn how to identify an
erosion and a stable slope. These illustrations are shown just to clarify what is meant by "Erosion" and
"Stable Slope" in this paper. The first term generalizes apparent damage that compromises or may come to
compromise the slope structure, while the second one includes all kinds of minor occurrences that are
considered irrelevant (like the holes dug for planting in Figure 4).

Figure 3. Example of an image for erosion [46]
Figure 4. Example of an image for a stable slope [47]


As shown in Figure 5, this paper proposes a custom CNN with 3 convolutional layers. The first and
second ones have 32 filters of size 3 (height and width), each of them having its output submitted to a
2x2 max pooling layer. The output of the third and last convolutional layer also goes through a max pooling
layer, which is followed by a flatten layer. This flatten layer is responsible for reshaping the
4-dimensional output into 2D, so the fully connected layers can use their neurons along with the ReLU
function to extract useful information.
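
To make the layer sequence concrete, the sketch below walks a 128x128 input through the described stack and tallies output shapes and trainable parameters. Several values are assumptions that this section does not state: a 3-channel (RGB) input, convolutions without padding and with stride 1, a first FC layer of 128 neurons and a 2-neuron output layer. The 64 filters of the third convolutional layer are taken from the performance analysis in section 4.

```python
def conv_output(h, w, c_in, filters, k=3):
    """Shape and parameter count of a k x k convolution (no padding, stride 1)."""
    params = k * k * c_in * filters + filters  # weights plus one bias per filter
    return (h - k + 1, w - k + 1, filters), params


def pool_output(h, w, c, k=2):
    """Shape after non-overlapping k x k max pooling."""
    return (h // k, w // k, c)


def describe_network(input_shape=(128, 128, 3), conv_filters=(32, 32, 64),
                     fc_units=(128, 2)):
    """Return (layer name, output shape, parameters) per layer and the total."""
    shape, total = input_shape, 0
    layers = []
    for filters in conv_filters:
        shape, params = conv_output(*shape, filters)
        total += params
        layers.append(("conv3x3", shape, params))
        shape = pool_output(*shape)
        layers.append(("maxpool2x2", shape, 0))
    flat = shape[0] * shape[1] * shape[2]
    layers.append(("flatten", (flat,), 0))
    n_in = flat
    for units in fc_units:
        params = n_in * units + units
        total += params
        layers.append(("fc", (units,), params))
        n_in = units
    return layers, total


layers, total = describe_network()
for name, shape, params in layers:
    print(f"{name:<11}{str(shape):<16}{params}")
print("total parameters:", total)
```

Under these assumptions the flatten layer hands 14x14x64 = 12,544 values to the first FC layer, which therefore holds most of the trainable parameters.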

Figure 5. Custom CNN structure that got the best performance


For this application, it would not be interesting for the winning class to be defined only by the
highest raw numerical value. The output of the second fully connected layer therefore uses the Softmax
function to estimate the probability that each input belongs to each of the two possible output classes
(erosion and stable), in an interval between 0 and 1. Consequently, for each input submitted to the neural
network, it returns the probability of there being an instability in it, as well as the probability of the
structure being stable.
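
The Softmax step can be sketched as below. The two raw scores are made-up values for a single image, ordered as (erosion, stable), and the subtraction of the maximum is a standard numerical-stability measure that does not change the result.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # avoids overflow in exp for large scores
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of the second FC layer: (erosion, stable).
probs = softmax([2.0, 0.5])
print(probs)  # the first probability is the larger one; both sum to 1
```

Each output lies between 0 and 1, so the pair can be read directly as the probability of instability and the probability of a stable structure.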


4. PERFORMANCE ANALYSIS 

Since the neural network flowchart developed for this application was explained in the previous
sections, this one brings more specific details of the CNN used to solve the previously exposed issue.
Among these details are: hyperparameters, convolutional weights, confusion matrix and overall performance,
some of which are shown in Table 2. To achieve the results discussed in this paper, the following
environment setup was used: i) hardware and operating system: Windows 10 Home x64, 8 GB RAM and Intel
i7-8750H (2.21 GHz); ii) programming language: Python 3.4.12; iii) web-based interactive computing
platform: Project Jupyter; and iv) main Python libraries: TensorFlow 2.8.0, Pandas 1.4.1, NumPy 1.22.3,
Matplotlib 3.5.1 and Scikit-learn 1.0.2.

Table 2 shows that the architecture proposed in this paper achieved the best performance. It is worth
noting that each combined architecture and its respective hyperparameters were tested at least four times.
This number of tests was carried out so that the randomness of the weight initialization and of the
training did not considerably affect the overview of each of the architectures.

Table 2. Comparison between different architectures

Architecture                                                              | Learning rate | Epochs/iterations | Number of neurons on the FC layers | Batch size | Performance for the validation dataset | Performance for the test dataset
--------------------------------------------------------------------------|---------------|-------------------|------------------------------------|------------|----------------------------------------|---------------------------------
3 convolutional layers + max pooling after each of them + 2 FC layers     | 1e-4          | 96/7000           | 128                                | 6          | 86.3%                                  | 29/30
4 convolutional layers + max pooling after each of them + 1 FC layer      | 1e-4          | 96/7000           | 128                                | 6          | 84.31%                                 | 25/30
3 convolutional layers + max pooling after each of them + 2 FC layers     | 3e-4          | 96/7000           | 128                                | 6          | 82.35%                                 | 25/30
3 convolutional layers + average pooling after each of them + 2 FC layers | 0.001         | 96/7000           | 128                                | 6          | 80.39%                                 | 24/30
2 convolutional layers + average pooling after each of them + 1 FC layer  | 0.01          | 96/7000           | 64                                 | 6          | 70.58%                                 | 18/30
2 convolutional layers + max pooling after each of them + 1 FC layer      | 0.001         | 96/7000           | 64                                 | 6          | 66.66%                                 | 18/30


The batch size was set to 6 for a better split of the image samples from the slope dataset as well as
to decrease the computational cost. The use of 128 neurons in the architecture tests is preferable due to
the number of pixels each image had after the reshape (128x128); this choice is supported by the results,
since performance dropped considerably in the cases where 64 neurons were used. It is worth mentioning that
all the tests used the Adam optimizer, as it has been the best optimizer option for deep learning scenarios
like the one proposed in this paper. As expected, the CNN success rates for the validation dataset and the
test dataset are similar in most cases, even though both these datasets had their samples randomly
separated from the 300 original images.

To better understand the weights and outputs of each convolution layer, the original inputs are
needed, those being the images analyzed by the proposed neural network. Figure 6 is an example of an input
(landslide) that was resized to 128x128 pixels and then analyzed by the CNN. Figure 7 and Figure 8 are the
interpretation of the neural network for the example in Figure 6, respectively representing the filter
weights and the feature maps generated by the first convolutional layer.


Figure 6. Example of a 128x128 image from the test dataset [48]


The examination of Figure 7 shows that the first convolutional layer traces the most relevant
sections of the example in Figure 6 according to the relevance presented by the filters during the mapping
process. The red cells have positive values and contribute to the interpretation of the input. On the other
hand, the blue ones are negative, which means that the corresponding area of the image has a low relevance
level for the definition of the final answer. As previously mentioned, 32 3x3 filters were applied during
the convolutional operations of the first layer. This filtering results in 32 different activation maps, or
neuron matrices (the output itself), shown in Figure 8. The analysis of the feature maps generated on each
convolutional layer seeks to understand which features the CNN detects, which in this case are highlighted
with shading. The last convolutional layer uses 64 3x3 filters, predefined to enable the CNN to map each
image in a more comprehensive way. As can be noticed, for the same input, the filters and feature maps of
the third and last convolutional layer in Figure 9 and Figure 10 are more accurate, since they have already
undergone the adjustments of all the previous convolution and pooling operations. Therefore, the shadow
effects better distinguish the limits of the slope and its instabilities.

With the parameters presented in this section, the confusion matrix shows that the network was able
to correctly identify 32 of the 37 images that presented instability in the validation dataset, and 12 out
of 14 stable slopes, as shown in Figure 11. As for the test dataset, 29 out of 30 images were correctly
classified, where a third of them were representations of stable slopes and the rest of the images were
split between erosion and landslide occurrences on a slope structure.
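
The validation counts above are enough to recompute the 86.3% reported in Table 2 and to derive a few complementary metrics. In the sketch below, instability is treated as the positive class; recall, specificity and precision are derived here for illustration and are not figures reported in this paper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Summary metrics from the four cells of a binary confusion matrix."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),        # share of unstable slopes that were found
        "specificity": tn / (tn + fp),   # share of stable slopes that were found
        "precision": tp / (tp + fp),     # share of instability alarms that were right
    }


# Validation dataset: 32 of 37 unstable and 12 of 14 stable images were correct.
metrics = binary_metrics(tp=32, fn=5, tn=12, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The accuracy evaluates to 44/51, which rounds to the 86.3% listed for the best architecture.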

Figure 7. Filters of the first convolutional layer
Figure 8. Feature maps generated by the first convolutional layer
Figure 9. Filters of the third convolutional layer
Figure 10. Feature maps generated by the third convolutional layer
Figure 11. Confusion matrix of the best performance obtained


5. CONCLUSIONS

Multiple convolutional layers are recommended when it comes to CNNs for image classification. In
this paper, however, the performance of the neural network decreased or did not show significant gains when
more than 3 layers were used. The combination of a relatively low learning rate with the max pooling
technique showed significant improvements, mainly because the network analyzed 128x128 px images. Even
though the database lacks more images that faithfully represent slope structures and their possible
instabilities, the developed CNN obtained very satisfactory results, showing itself to be promising for the
analysis of different types of structures and needs of the structural health monitoring (SHM) area. Among
the main difficulties faced during the development of this paper, the following stand out: the lack of a
greater variety of images and the hardware limitations, which restricted the achievement of better results.


6. FUTURE WORKS 

As future work, the main goal is to train this exact CNN architecture with a more robust dataset,
with more than 2,000 slope and erosion images, to verify whether its performance remains satisfactory or
whether changes would be needed for the network to become more accurate. Even though it is possible to
develop a CNN application to analyze the progressive failure process of slopes, it would require an
extensive dataset with several images of the same slope at different angles and timestamps, so that the
factor of safety calculation could be accurate. Another suggestion is to merge this CNN with an
autoencoder, so that the software would be able to predict and simulate how each kind of instability
affects a slope structure over time, and not just detect the already existing ones.